Multi-stream language identification using data-driven dependency selection

نویسندگان

Sonia Parandekar

Katrin Kirchhoff

چکیده

The most widespread approach to automatic language identification in the past has been the statistical modeling of phone sequences extracted from speech signals. Recently, we have developed an alternative approach to LID based on n-gram modeling of parallel streams of articulatory features, which was shown to have advantages over phone-based systems on short test signals whereas the latter achieved a higher accuracy on longer signals. Additionally, phone and feature streams can be combined to achieve maximum performance. Within this “multi-stream” framework two types of statistical dependencies need to be modeled: (a) dependencies between symbols in individual streams and (b) dependencies between symbols in different streams. The space of possible dependencies is typically too large to be searched exhaustively. In this paper, we explore the use of genetic algorithms as a method for data-driven dependency selection. The result is a general framework for the discovery and modeling of dependencies between multiple information sources expressed as sequences of symbols, which has implications for other fields beyond language identification, such as speaker identification or language modeling.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Data-driven systems can be adapted to different languages and domains easily. Using this trend in dependency parsing was lead to introduce data-driven approaches. Existence of appreciate corpora that contain sentences and theirs associated dependency trees are the only pre-requirement in data-driven approaches. Despite obtaining high accurate results for dependency parsing task in English langu...

متن کامل

Feature Engineering in Persian Dependency Parser

Dependency parser is one of the most important fundamental tools in the natural language processing, which extracts structure of sentences and determines the relations between words based on the dependency grammar. The dependency parser is proper for free order languages, such as Persian. In this paper, data-driven dependency parser has been developed with the help of phrase-structure parser fo...

متن کامل

Pronominal Reference Type Identification and Event Anaphora Resolution for Hindi

In this paper, we present hybrid approaches for pronominal reference type (abstract or concrete) identification and event anaphora resolution for Hindi. Pronominal reference type identification is one of the important parts for any anaphora resolution system as it helps anaphora resolver in optimal feature selection based on pronominal reference types. We use language specific rules and feature...

متن کامل

Global Inference and Learning Algorithms for Multi-Lingual Dependency Parsing

This paper gives an overview of the work of McDonald et al. (McDonald et al. 2005a, 2005b; McDonald and Pereira 2006;McDonald et al. 2006) on global inference and learning algorithms for data-driven dependency parsing. Further details can be found in the thesis of McDonald (McDonald 2006). This paper is primarily intended for the audience of the ESSLLI 2007 course on data-driven dependency pars...

متن کامل

The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context

The present study was conducted to evaluate data driven learning (DDL) combined with Computer Assisted Language Learning (CALL) as an approach to improving vocabulary knowledge of Iranian postgraduates majoring in teaching English, English literature and translation. The purpose was to help language learners get familiar with DDL as a student-centered method taking advantage of a computer progr...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2003

Multi-stream language identification using data-driven dependency selection

نویسندگان

چکیده

منابع مشابه

تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی

Feature Engineering in Persian Dependency Parser

Pronominal Reference Type Identification and Event Anaphora Resolution for Hindi

Global Inference and Learning Algorithms for Multi-Lingual Dependency Parsing

The Influence of Data-Driven Exercises Through Using a Computer Program on Vocabulary Improvement in an EFL Context

عنوان ژورنال:

اشتراک گذاری